CODE: A Moving-Window-Based Framework for Detecting Concept Drift in Software Defect Prediction
Authors
Abstract
Concept drift (CD) refers to data distributions that may vary after a minimum stable period. CD negatively influences the performance of software defect prediction (SDP) models trained on past datasets when they are applied to new datasets. Based on previous studies of SDP, it is confirmed that the accuracy of models is negatively affected by changes in data distributions. Moreover, cross-version (CV) defect data are naturally asymmetric due to the nature of their class imbalance. In this paper, a moving window-based concept-drift detection (CODE) framework is proposed to detect CD in chronologically asymmetric defective datasets and to investigate the feasibility of alleviating CD from the data. The CODE framework consists of four steps, in which the first pre-processes the defect datasets and forms CV chronological data, the second constructs the CV defect models, the third calculates the test statistics, and the fourth provides a hypothesis-test-based CD detection method. In prior studies of SDP, it is observed that, in an effort to make the data more symmetric, class-rebalancing techniques are utilized, which improves the prediction performance of the models. The ability of the CODE framework is demonstrated by conducting experiments on 36 versions of 10 software projects. Some key findings are: (1) Up to 50% of the chronological-defect datasets are drift-prone when applying the most popular classifiers used in the SDP literature. (2) The class-rebalancing techniques had a positive impact on the prediction performance for cross-version defect prediction (CVDP) by correctly classifying the CV defective modules, and detected CD by up to 31% on the resampled datasets.
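The last two steps of the framework (computing a test statistic and applying a hypothesis test over chronologically ordered windows) can be illustrated with a small sketch. The abstract does not say which statistic or test the authors use, so the two-proportion z-test on misclassification rates below is an assumption chosen for illustration, and the names `two_proportion_z_test` and `detect_drift` are hypothetical.

```python
import math
from typing import List, Tuple


def two_proportion_z_test(e1: int, n1: int, e2: int, n2: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both windows share one error rate.

    e1, e2 are misclassified counts; n1, n2 are the numbers of modules tested.
    ASSUMPTION: this test stands in for the unspecified statistic in the paper.
    """
    p1, p2 = e1 / n1, e2 / n2
    pooled = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0.0:
        # Both windows are all-correct or all-wrong: no evidence of a change.
        return 0.0, 1.0
    z = (p2 - p1) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def detect_drift(results: List[Tuple[int, int]], alpha: float = 0.05) -> List[bool]:
    """Slide a window of two consecutive versions over chronological results.

    results[i] = (misclassified, total) for the model evaluated on version i.
    The returned flag i is True when the error rate on version i + 1 differs
    significantly from the error rate on version i at level alpha.
    """
    flags = []
    for (e1, n1), (e2, n2) in zip(results, results[1:]):
        _, p_value = two_proportion_z_test(e1, n1, e2, n2)
        flags.append(p_value < alpha)
    return flags


if __name__ == "__main__":
    # Hypothetical counts for three successive versions of one project.
    print(detect_drift([(20, 200), (22, 200), (60, 200)]))
```

With these made-up counts, the error rate moves from 10% to 11% and then to 30%, so only the second transition is flagged. A real study would also have to handle class imbalance and multiple comparisons, which this sketch ignores.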
Similar resources
Exponentially weighted moving average charts for detecting concept drift
Classifying streaming data requires the development of methods which are computationally efficient and able to cope with changes in the underlying distribution of the stream, a phenomenon known in the literature as concept drift. We propose a new method for detecting concept drift which uses an Exponentially Weighted Moving Average (EWMA) chart to monitor the misclassification rate of an strea...
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...
A Framework for Defect Prediction in Specific Software Project Contexts
Software defect prediction has drawn the attention of many researchers in empirical software engineering and software maintenance due to its importance in providing quality estimates and to identify the needs for improvement from project management perspective. However, most defect prediction studies seem valid primarily in a particular context and little concern is given on how to find out whi...
A Comparison Framework of Classification Models for Software Defect Prediction
A software defect is an error, failure, or fault in a software [1], that produces an incorrect or unexpected result, or causes it to behave in unintended ways. It is a deficiency in a software product that causes it to perform unexpectedly [2]. Software defects or software faults are expensive in quality and cost. Moreover, the cost of capturing and correcting defects is one of the most expensi...
Software defect prediction using static code metrics: formulating a methodology
Software defect prediction is motivated by the huge costs incurred as a result of software failures. In an effort to reduce these costs, researchers have been utilising software metrics to try and build predictive models capable of locating the most defect-prone parts of a system. These areas can then be subject to some form of further analysis, such as a manual code review. It is hoped that su...
Journal
Journal title: Symmetry
Year: 2022
ISSN: 0865-4824, 2226-1877
DOI: https://doi.org/10.3390/sym14122508